 docker container




Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization

Papenmeier, Leonard, Nardi, Luigi

arXiv.org Artificial Intelligence

We present Bencher, a modular benchmarking framework for black-box optimization that fundamentally decouples benchmark execution from optimization logic. Unlike prior suites that focus on combining many benchmarks in a single project, Bencher introduces a clean abstraction boundary: each benchmark is isolated in its own virtual Python environment and accessed via a unified, version-agnostic remote procedure call (RPC) interface. This design eliminates dependency conflicts and simplifies the integration of diverse, real-world benchmarks, which often have complex and conflicting software requirements. Bencher can be deployed locally or remotely via Docker or on high-performance computing (HPC) clusters via Singularity, providing a containerized, reproducible runtime for any benchmark. Its lightweight client requires minimal setup and supports drop-in evaluation of 80 benchmarks across continuous, categorical, and binary domains.
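The decoupling described above can be illustrated with a minimal sketch. The abstract does not say which RPC protocol Bencher uses, so this example uses Python's standard `xmlrpc` modules purely for illustration; the `sphere` benchmark, the `evaluate` method name, and the in-process server thread are all assumptions, not Bencher's actual interface. In a real deployment the server side would run in its own environment or container.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def sphere(point):
    """Hypothetical benchmark function: sum of squares of the input point."""
    return sum(x * x for x in point)


def start_benchmark_server():
    """Serve the benchmark over RPC on a free local port (illustrative only)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    # Expose the benchmark under a generic name so the client never
    # imports the benchmark's own code or dependencies.
    server.register_function(sphere, "evaluate")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_benchmark_server()
port = server.server_address[1]

# The optimizer side only needs an RPC client and the server address.
client = ServerProxy(f"http://127.0.0.1:{port}")
value = client.evaluate([1.0, 2.0, 3.0])
print(value)

server.shutdown()
server.server_close()
```

The point of the design is that the optimizer only depends on the RPC client, so a benchmark's conflicting requirements stay on the server side of the boundary.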


An LLM-based Agent for Reliable Docker Environment Configuration

Hu, Ruida, Peng, Chao, Wang, Xinchen, Gao, Cuiyun

arXiv.org Artificial Intelligence

Environment configuration is a critical yet time-consuming step in software development, especially when dealing with unfamiliar code repositories. While Large Language Models (LLMs) demonstrate the potential to accomplish software engineering tasks, existing methods for environment configuration often rely on manual efforts or fragile scripts, leading to inefficiencies and unreliable outcomes. We introduce Repo2Run, the first LLM-based agent designed to fully automate environment configuration and generate executable Dockerfiles for arbitrary Python repositories. We address two major challenges: (1) enabling the LLM agent to configure environments within isolated Docker containers, and (2) ensuring the successful configuration process is recorded and accurately transferred to a Dockerfile without error. To achieve this, we propose atomic configuration synthesis, featuring a dual-environment architecture (internal and external environment) with a rollback mechanism to prevent environment "pollution" from failed commands, guaranteeing atomic execution (execute fully or not at all) and a Dockerfile generator to transfer successful configuration steps into runnable Dockerfiles. We evaluate Repo2Run on our proposed benchmark of 420 recent Python repositories with unit tests, where it achieves an 86.0%
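The atomic execution idea (a command either takes full effect or leaves the environment untouched, and only successful commands reach the Dockerfile) can be sketched as follows. This is a toy model under stated assumptions, not Repo2Run's implementation: the environment is a plain dictionary instead of a Docker container, and `run_atomically`, `synthesize`, `install`, and `broken_install` are hypothetical names.

```python
import copy


class CommandFailed(Exception):
    """Raised by a simulated command that fails part-way through."""


def run_atomically(env, apply):
    """Apply a command to env; restore the snapshot if it fails."""
    snapshot = copy.deepcopy(env)
    try:
        apply(env)
    except CommandFailed:
        # Roll back so a half-applied command cannot pollute the environment.
        env.clear()
        env.update(snapshot)
        return False
    return True


def synthesize(base_image, steps):
    """Run steps in order and record only the successful ones as RUN lines."""
    env = {"packages": []}
    dockerfile = [f"FROM {base_image}"]
    for command, apply in steps:
        if run_atomically(env, apply):
            dockerfile.append(f"RUN {command}")
    return env, "\n".join(dockerfile)


def install(pkg):
    """Simulated successful install (hypothetical helper)."""
    return lambda env: env["packages"].append(pkg)


def broken_install(pkg):
    """Simulated install that mutates the environment and then fails."""
    def apply(env):
        env["packages"].append(pkg + "-partial")
        raise CommandFailed(pkg)
    return apply


steps = [
    ("pip install requests", install("requests")),
    ("pip install brokenpkg", broken_install("brokenpkg")),
    ("pip install pytest", install("pytest")),
]
env, dockerfile = synthesize("python:3.10", steps)
print(dockerfile)
```

The failed step leaves no trace in either the simulated environment or the generated Dockerfile, which is the property the abstract calls atomic execution.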


A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems

Radulov, Nikola, Zhang, Yuhao, Bujanca, Mihai, Ye, Ruiqi, Luján, Mikel

arXiv.org Artificial Intelligence

We propose SLAMFuse, an open-source SLAM benchmarking framework that provides consistent cross-platform environments for evaluating multi-modal SLAM algorithms, along with tools for data fuzzing, failure detection, and diagnosis across different datasets. Our framework introduces a fuzzing mechanism to test the resilience of SLAM algorithms against dataset perturbations. This enables the assessment of pose estimation accuracy under varying conditions and identifies critical perturbation thresholds. SLAMFuse improves diagnostics with failure detection and analysis tools, examining algorithm behaviour against dataset characteristics. SLAMFuse uses Docker to ensure reproducible testing conditions across diverse datasets and systems by streamlining dependency management. Emphasizing the importance of reproducibility and introducing advanced tools for algorithm evaluation and performance diagnosis, our work sets a new precedent for reliable benchmarking of SLAM systems. We provide ready-to-use, Docker-compatible versions of the algorithms and datasets used in the experiments, together with guidelines for integrating and benchmarking new algorithms. Code is available at https://github.com/nikolaradulov/slamfuse
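The notion of a critical perturbation threshold can be illustrated with a small sketch: perturb the input at increasing levels and report the first level at which the estimate's error exceeds a tolerance. Everything here is an assumption made for illustration, including the one-dimensional toy odometry, the constant-bias perturbation, and the function names; none of it is taken from SLAMFuse.

```python
def perturb(velocities, level):
    """Hypothetical perturbation: add a constant bias to every sample."""
    return [v + level for v in velocities]


def toy_odometry(velocities, dt=1.0):
    """Toy 1-D pose estimator: integrate velocities into positions."""
    position = 0.0
    trajectory = []
    for v in velocities:
        position += v * dt
        trajectory.append(position)
    return trajectory


def final_error(estimate, reference):
    """Absolute error between the final estimated and reference positions."""
    return abs(estimate[-1] - reference[-1])


def critical_threshold(velocities, levels, tolerance):
    """Return the smallest level whose error exceeds tolerance, else None."""
    reference = toy_odometry(velocities)
    for level in sorted(levels):
        estimate = toy_odometry(perturb(velocities, level))
        if final_error(estimate, reference) > tolerance:
            return level
    return None


velocities = [1.0] * 8
levels = [0.0, 0.125, 0.25, 0.5]
threshold = critical_threshold(velocities, levels, tolerance=1.5)
print(threshold)
```

A real fuzzing setup would perturb sensor data such as images or IMU streams and compare full trajectories, but the sweep-until-failure structure is the same.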


NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security

Shao, Minghao, Jancheska, Sofija, Udeshi, Meet, Dolan-Gavitt, Brendan, Xi, Haoran, Milner, Kimberly, Chen, Boyuan, Yin, Max, Garg, Siddharth, Krishnamurthy, Prashanth, Khorrami, Farshad, Karri, Ramesh, Shafique, Muhammad

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are being deployed across various domains today. However, their capacity to solve Capture the Flag (CTF) challenges in cybersecurity has not been thoroughly evaluated. To address this, we develop a novel method to assess LLMs in solving CTF challenges by creating a scalable, open-source benchmark database specifically designed for these applications. This database includes metadata for LLM testing and adaptive learning, compiling a diverse range of CTF challenges from popular competitions. Utilizing the advanced function calling capabilities of LLMs, we build a fully automated system with an enhanced workflow and support for external tool calls. Our benchmark dataset and automated framework allow us to evaluate the performance of five LLMs, encompassing both black-box and open-source models. This work lays the foundation for future research into improving the efficiency of LLMs in interactive cybersecurity tasks and automated task planning. By providing a specialized dataset, our project offers an ideal platform for developing, testing, and refining LLM-based approaches to vulnerability detection and resolution. Evaluating LLMs on these challenges and comparing them with human performance yields insights into the potential of AI-driven cybersecurity solutions for real-world threat management. We make our dataset publicly available at https://github.com/NYU-LLM-CTF/LLM_CTF_Database, along with our automated playground framework at https://github.com/NYU-LLM-CTF/llm_ctf_automation.


SMOTEC: An Edge Computing Testbed for Adaptive Smart Mobility Experimentation

Nezami, Zeinab, Pournaras, Evangelos, Borzouie, Amir, Xu, Jie

arXiv.org Artificial Intelligence

Smart mobility is becoming paramount for meeting net-zero targets. However, autonomous, self-driving and electric vehicles require, more than ever before, an efficient, resilient and trustworthy computational offloading backbone that spans the edge-to-cloud continuum. Utilizing on-demand heterogeneous computational resources for smart mobility is challenging and often cost-ineffective. This paper introduces SMOTEC, a novel open-source testbed for adaptive smart mobility experimentation with edge computing. SMOTEC provides, for the first time, modular end-to-end instrumentation for prototyping and optimizing the placement of intelligence services, such as augmented reality and real-time traffic monitoring, on edge devices. SMOTEC supports plug-and-play Docker container integration of the SUMO simulator for urban mobility, Raspberry Pi edge devices communicating via ZeroMQ, and EPOS for AI-based decentralized load balancing across edge-to-cloud. All components are orchestrated by K3s, a lightweight Kubernetes distribution. A proof-of-concept of self-optimized service placements for traffic monitoring in Munich demonstrates the practical applicability and cost-effectiveness of SMOTEC.


How to Install Chat with GPT on Your Synology NAS – Marius Hosting

#artificialintelligence

Chat with GPT, abbreviated CWGPT, is an open-source, unofficial ChatGPT app with extra features and more ways to customize your experience. It connects to ChatGPT with your own API key and, with an extra API key from ElevenLabs, gives ChatGPT a realistic human voice during the interaction. In this step-by-step guide I will show you how to install Chat with GPT on your Synology NAS with Docker.


The SLAM Hive Benchmarking Suite

Yang, Yuanyuan, Xu, Bowen, Li, Yinjie, Schwertfeger, Sören

arXiv.org Artificial Intelligence

Benchmarking Simultaneous Localization and Mapping (SLAM) algorithms is important to scientists and users of robotic systems alike. But because of their many hardware and software configuration options, SLAM systems feature a vast parameter space that scientists have so far been unable to explore. The proposed SLAM Hive Benchmarking Suite is able to analyze SLAM algorithms in thousands of mapping runs through its use of container technology and deployment in a cluster. This paper presents the architecture and open-source implementation of SLAM Hive and compares it to existing efforts on SLAM evaluation. Furthermore, we highlight the function of SLAM Hive by exploring some open-source algorithms on public datasets in terms of accuracy. We compare the algorithms against each other and evaluate how parameters affect not only accuracy but also CPU and memory usage. Through this we show that SLAM Hive can become an essential tool for proper comparisons and evaluations of SLAM algorithms and thus drive scientific development in SLAM research.


Simplify deploying YOLOv5 to using new OctoML CLI

#artificialintelligence

Follow along with our new YOLOv5 deployment tutorial to power your next object detection application. Or, watch this tutorial video by Smitha Kolan on how to deploy YOLOv5 in under 15 minutes using the OctoML CLI. Today, we are excited to announce the results of our collaboration with Ultralytics to deploy the YOLOv5 models to over 100 CPU and GPU hardware targets in AWS, Azure and GCP. Our engineering work with Ultralytics unlocks the ability to deploy YOLOv5 models on hardware from Intel, NVIDIA, Arm and AWS, with minimal effort and cost. In this blog, I'll show you how simple it is to achieve hardware independence and cost savings across multiple clouds.